Redesigning OP2 Compiler to Use HPX Runtime Asynchronous Techniques
Maximizing the level of parallelism in applications can be achieved by
minimizing overheads due to load imbalances and waiting time due to memory
latencies. Compiler optimization is one of the most effective solutions to this
problem. The compiler can detect the data dependencies in an application and
analyze specific sections of code for parallelization potential. However, the
techniques provided by a compiler are usually applied at compile time, so they
rely on static analysis, which is insufficient for achieving maximum parallelism
and the desired application scalability. One solution to this challenge is the
use of runtime methods, implemented by deferring a certain amount of code
analysis to runtime. In this research, we improve the performance of parallel
applications generated by the OP2 compiler by leveraging HPX, a C++ runtime
system, to provide runtime optimizations. These optimizations include
asynchronous tasking, loop interleaving, dynamic chunk sizing, and data
prefetching. The results were evaluated using an Airfoil application, which
showed a 40-50% improvement in parallel performance.
Comment: 18th IEEE International Workshop on Parallel and Distributed
Scientific and Engineering Computing (PDSEC 2017)
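The abstract names asynchronous tasking and dynamic chunk sizing without showing code. As a rough illustration only, the sketch below splits a loop over `[0, n)` into chunks that run as independent tasks and hands back futures instead of blocking at a barrier. It uses standard C++ `std::async` in place of HPX, and the helpers `pick_chunk_size`, `par_loop_async`, and `wait_all` are hypothetical: they are not OP2's generated code or HPX's API, and the chunk-size rule is a simple static heuristic rather than the dynamic chunk sizing evaluated in the paper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <thread>
#include <vector>

// Hypothetical heuristic: aim for roughly `tasks_per_thread` chunks per
// hardware thread. A static rule like this only approximates dynamic chunk
// sizing, which would adapt the chunk size at runtime.
std::size_t pick_chunk_size(std::size_t n, std::size_t tasks_per_thread = 4) {
    assert(tasks_per_thread > 0);
    // hardware_concurrency() may return 0 when unknown; treat that as 1.
    const std::size_t threads = std::max(1u, std::thread::hardware_concurrency());
    const std::size_t tasks = threads * tasks_per_thread;
    return std::max<std::size_t>(1, (n + tasks - 1) / tasks);
}

// Hypothetical asynchronous loop: each chunk of [0, n) becomes its own task,
// and the caller receives futures instead of waiting at a barrier.
template <typename Body>
std::vector<std::future<void>> par_loop_async(std::size_t n, Body body) {
    std::vector<std::future<void>> chunks;
    const std::size_t chunk = pick_chunk_size(n);
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(n, begin + chunk);
        chunks.push_back(std::async(std::launch::async, [=] {
            for (std::size_t i = begin; i < end; ++i) body(i);
        }));
    }
    return chunks;
}

// Block until every chunk has finished; get() rethrows any chunk's exception.
void wait_all(std::vector<std::future<void>>& futures) {
    for (auto& f : futures) f.get();
}
```

If two independent loops are both launched with `par_loop_async` before either is passed to `wait_all`, their chunks can run concurrently, which is loosely the idea behind interleaving loops. A real runtime would also have to track data dependencies between loops, which this sketch does not do.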
The Stomatal Response of Sambucus nigra and Aegopodium podagraria as a Function of Light and Air Humidity: In-Situ Observations and Gas-Exchange Measurements in the Field
Simultaneous microscopic observations of stomatal movements and measurements of CO2 and H2O gas exchange were carried out on intact plants of Sambucus nigra L. and Aegopodium podagraria L. at a field site. Recording the responses under the natural microclimate and under controlled light, humidity, and temperature conditions led to a description of how the factors air humidity and light intensity interact. While A. podagraria, with generally low stomatal activity, optimized the photosynthetic use of sporadic sunflecks in the canopy at the cost of increased transpiration, S. nigra limited transpiration through a sensitive humidity response. The results are interpreted in terms of growth form and site conditions.
Shared memory parallelism in Modern C++ and HPX
Parallel programming remains a daunting challenge, from the struggle to
express a parallel algorithm without cluttering the underlying synchronous
logic, to describing which devices to employ in a calculation, to correctness.
Over the years, numerous solutions have arisen, many of them requiring new
programming languages, extensions to programming languages, or the addition of
pragmas. Support for these various tools and extensions is available to a
varying degree. In recent years, the C++ standards committee has worked to
refine the language features and libraries needed to support parallel
programming on a single computational node. Eventually, all major vendors and
compilers will provide robust and performant implementations of these
standards. Until then, the HPX library and runtime provide cutting-edge
implementations of these standards, as well as of proposed standards and
extensions. Because of these advances, it is now possible to write
high-performance parallel code without custom extensions to C++. We provide an
overview of modern parallel programming in C++, describing the language and
library features and providing brief examples of how to use them.
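To make the point about standard C++ concrete, here is a minimal sketch of a recursive fork-join reduction written with only `std::async` and `std::future`, without pragmas or language extensions. HPX offers `hpx::async` and `hpx::future` with a similar shape, but this block uses only the standard library; the function name `parallel_sum` and the `cutoff` parameter are illustrative assumptions, not taken from the paper.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Recursive fork-join sum over [first, last): the left half runs as a new
// task, the right half runs in the current thread, then the results are joined.
// `cutoff` (a hypothetical tuning knob) stops splitting for small ranges so
// task-creation overhead does not dominate.
long long parallel_sum(const long long* first, const long long* last,
                       std::size_t cutoff = 10000) {
    assert(cutoff > 0);
    const std::size_t n = static_cast<std::size_t>(last - first);
    if (n <= cutoff) {
        return std::accumulate(first, last, 0LL);  // sequential base case
    }
    const long long* mid = first + n / 2;
    auto left = std::async(std::launch::async,
                           [=] { return parallel_sum(first, mid, cutoff); });
    const long long right = parallel_sum(mid, last, cutoff);
    return left.get() + right;  // join: wait for the forked half
}

// Convenience overload for a whole vector.
long long parallel_sum(const std::vector<long long>& v, std::size_t cutoff = 10000) {
    return parallel_sum(v.data(), v.data() + v.size(), cutoff);
}
```

Each `std::async(std::launch::async, ...)` call here typically starts a new OS thread, so the cutoff has to stay coarse; task-based runtimes such as HPX schedule lightweight tasks instead. For a plain reduction like this one, the C++17 parallel algorithms (for example `std::reduce` with `std::execution::par`) express the same intent more directly.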
Stellar Mergers with HPX-Kokkos and SYCL: Methods of using an Asynchronous Many-Task Runtime System with SYCL
Given the heterogeneity of accelerator cards available in current
supercomputers, ranging from NVIDIA GPUs to AMD GPUs and Intel GPUs,
portability is a key aspect of modern HPC applications. In Octo-Tiger, we rely on Kokkos and
its various execution spaces for portable compute kernels. In turn, we use HPX
to coordinate kernel launches, CPU tasks, and communication. This combination
allows us to have a fine interleaving between portable CPU/GPU computations and
communication, enabling scalability on various supercomputers. However, for HPX
and Kokkos to work together optimally, we need to be able to treat Kokkos
kernels as HPX tasks. Otherwise, instead of integrating asynchronous Kokkos
kernel launches into HPX's task graph, we would have to actively wait for them
with fence commands, wasting CPU time that could be better spent elsewhere. Using an
integration layer called HPX-Kokkos, treating Kokkos kernels as tasks already
works for some Kokkos execution spaces (like the CUDA one), but not for others
(like the SYCL one). In this work, we started making Octo-Tiger and HPX itself
compatible with SYCL. To do so, we introduce numerous software changes, most
notably an HPX-SYCL integration. This integration allows us to treat SYCL
events as HPX tasks, which in turn allows us to better integrate Kokkos by
extending the support of HPX-Kokkos to also fully support Kokkos' SYCL
execution space. We show two ways to implement this HPX-SYCL integration and
test them using Octo-Tiger and its Kokkos kernels, on both an NVIDIA A100 and
an AMD MI100. We find modest, yet noticeable, speedups by enabling this
integration, even when just running simple single-node scenarios with
Octo-Tiger, where communication and CPU utilization are not yet an issue.
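The central idea above, treating a device event as a task instead of fencing on it, can be illustrated without a GPU. The sketch below rests on stated assumptions: `SimulatedQueue` and `SimulatedEvent` are hypothetical types that mimic a queue whose submissions return events with a completion callback, and `as_future` adapts such an event into a `std::future` through a `std::promise`. It is not the HPX-SYCL integration from the paper (which presents two implementations not detailed here), and it uses no real SYCL, Kokkos, or HPX calls.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a device event: completion is reported only
// through registered callbacks (no blocking wait is used below).
class SimulatedEvent {
public:
    // Run `cb` once the event completes; runs it immediately if already done.
    void on_complete(std::function<void()> cb) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (done_) {
            lock.unlock();
            cb();
            return;
        }
        callbacks_.push_back(std::move(cb));
    }

    // Called by the "device" when the work behind this event has finished.
    void mark_complete() {
        std::vector<std::function<void()>> pending;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
            pending.swap(callbacks_);
        }
        for (auto& cb : pending) cb();  // invoke outside the lock
    }

private:
    std::mutex mutex_;
    bool done_ = false;
    std::vector<std::function<void()>> callbacks_;
};

// Hypothetical queue: each submitted "kernel" runs on its own worker thread,
// and the returned event completes when it finishes. Not safe for concurrent
// submit() calls; worker threads are joined when the queue is destroyed.
class SimulatedQueue {
public:
    ~SimulatedQueue() {
        for (auto& t : workers_) t.join();
    }

    std::shared_ptr<SimulatedEvent> submit(std::function<void()> kernel) {
        auto ev = std::make_shared<SimulatedEvent>();
        workers_.emplace_back([ev, kernel = std::move(kernel)] {
            kernel();
            ev->mark_complete();
        });
        return ev;
    }

private:
    std::vector<std::thread> workers_;
};

// Adapt an event into a future: the completion callback fulfils a promise,
// so dependent work can wait on the future instead of the caller fencing.
std::future<void> as_future(const std::shared_ptr<SimulatedEvent>& ev) {
    auto promise = std::make_shared<std::promise<void>>();
    std::future<void> fut = promise->get_future();
    ev->on_complete([promise] { promise->set_value(); });
    return fut;
}
```

The caller gets a future it can hand to other tasks and is free to do CPU work until it actually needs the result. With HPX, the corresponding `hpx::future` also supports attaching continuations via `.then`, which `std::future` lacks.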